
Conversation


@gfyrag gfyrag commented Dec 1, 2025

Summary by cubic

Speed up v2.3 migration scripts by switching to row-number batching, adding targeted indexes, and narrowing updates. This cuts runtime and lock contention on large datasets; the heavy PCV rebuild step is disabled in this backport.

  • Refactors
    • Use window row_number + range filtering instead of offset/limit across migrations (17, 18, 28, 29, 31, 32, 33).
    • Add temporary staging tables and indexes (with included columns) to accelerate joins and updates.
    • Limit writes (e.g., set inserted_at only when null; process only NEW_TRANSACTION/REVERTED_TRANSACTION logs).
    • Disable the costly PCV recompute in migration 27 to avoid long-running work.
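The row-number batching pattern described above can be sketched as follows. This is an illustration only, not the actual migration code: the `transactions`, `id`, and `inserted_at` names are assumptions.

```sql
-- Sketch only: table and column names are assumptions.
-- Stage the work set once, numbering rows for deterministic range scans.
create temporary table to_fix as
select t.id, row_number() over (order by t.id) as row_number
from transactions t
where t.inserted_at is null;

create index on to_fix (row_number) include (id);
analyze to_fix;

do $$
declare
    _batch_size constant integer := 1000;
    _offset     integer := 0;
    _max        integer;
begin
    select count(*) into _max from to_fix;
    while _offset < _max loop
        -- A range predicate on the precomputed row_number replaces OFFSET/LIMIT,
        -- so each batch is a cheap index range scan instead of a re-sorted skip.
        update transactions t
        set inserted_at = now()
        from to_fix f
        where f.id = t.id
          and f.row_number > _offset
          and f.row_number <= _offset + _batch_size
          and t.inserted_at is null;   -- narrow the write to rows that need it
        _offset := _offset + _batch_size;
    end loop;
end $$;
```

The motivation for the switch: `OFFSET n` must still walk past the first n rows on every iteration, so total cost grows quadratically with table size, while the row_number range predicate stays proportional to the batch size on each pass.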

Written for commit 38fc49c. Summary will update automatically on new commits.

@gfyrag gfyrag requested a review from a team as a code owner December 1, 2025 13:37

coderabbitai bot commented Dec 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Multiple storage/bucket SQL migrations were rewritten to use precomputed row_number-based batching and temporary tables: migration 17 switched to a staged transactions_ids table with batched commits and notifications; migration 18 moved to type-based filtering and row_number batching; migration 27 was replaced by a no-op notice; migrations 28–33 adopt row_number paging and adjusted indexes.

Changes

  • Row-number pagination across multiple migrations
    Files: internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql, internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql, internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql, internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql, internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
    Add row_number() to view/CTE outputs, replace OFFSET/LIMIT with WHERE row_number >= _offset AND row_number < _offset + _batch_size, and update indexes to use row_number for deterministic batching.
  • Staged batched update with temp table
    Files: internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
    Replace the correlated/CTE update with a precomputed transactions_ids temporary table, create an index and ANALYZE it, iterate over row_number ranges in batches, COMMIT explicitly after each batch, emit progress notices, and remove the final NOT VALID constraint enforcement.
  • Type-based filtering & row_number batching
    Files: internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
    Remove the date-driven migration timestamp, add row_number to logs_transactions, switch the filter to type IN ('NEW_TRANSACTION', 'REVERTED_TRANSACTION'), reindex by row_number, and use row_number ranges for batch updates (updating only rows where inserted_at IS NULL).
  • Disabled / no-op migration
    Files: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql
    Replace the previous DO block and batch/update logic with a single RAISE NOTICE 'Migration superseded by next migration', effectively disabling the migration's prior operational steps.
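The staged pattern summarized for migration 17 could look roughly like the sketch below. The join and column names are assumptions based on the walkthrough, not the actual file; note that COMMIT inside a DO block requires PostgreSQL 11+ and no surrounding explicit transaction.

```sql
-- Sketch of the migration-17 approach; names are assumptions.
create temporary table transactions_ids as
select m.id as move_id,
       t.id as transaction_id,
       row_number() over (order by m.id) as row_number
from moves m
join transactions t on t.id = m.transactions_id;   -- assumed join condition

create index on transactions_ids (row_number) include (move_id, transaction_id);
analyze transactions_ids;   -- give the planner fresh statistics on the temp table

do $$
declare
    _batch_size constant integer := 1000;
    _max        integer;
begin
    select count(*) into _max from transactions_ids;
    for i in 1.._max by _batch_size loop
        update moves m
        set transaction_id = s.transaction_id
        from transactions_ids s
        where s.move_id = m.id
          and s.row_number >= i
          and s.row_number < i + _batch_size;
        commit;  -- release locks between batches (PostgreSQL 11+ only)
        raise notice 'migrated rows % to %', i, least(i + _batch_size - 1, _max);
    end loop;
end $$;
```

The per-batch COMMIT is what keeps lock contention bounded: each batch holds row locks only until its own commit instead of until the end of one giant transaction.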

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to migrations/17 (temp table creation, indexing, ANALYZE, commit points, and progress notices).
  • Verify correctness of new row_number() computations and updated indexes across migrations 28–33 to ensure deterministic ordering matches prior semantics.
  • Confirm migrations/18 update condition (inserted_at IS NULL) and type-based filtering do not miss intended rows.
  • Check that disabling migration 27 via a notice is intentional and documented.

Possibly related PRs

Suggested reviewers

  • flemzord

Poem

🐰
I bounced through SQL, row numbers in tow,
No OFFSET left for me to tow,
Temp tables hum, commits in a line,
Batches march forward, tidy and fine,
A carrot for progress — migrations shine!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

  • Title check — Passed: The PR title 'feat: improve migration performances' directly reflects the main objective of optimizing migration scripts by refactoring batching logic and adding indexes.
  • Description check — Passed: The description comprehensively details the performance improvements: row-number batching, staging tables, targeted indexes, narrowed writes, and a disabled PCV recompute across seven migrations.
  • Docstring coverage — Passed: No functions found in the changed files to evaluate docstring coverage; skipping the check.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between ee80d0d and 38fc49c.

📒 Files selected for processing (8)
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1 hunks)
  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (2 hunks)
  • internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1 hunks)
  • internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql (3 hunks)
  • internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql (2 hunks)
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql (2 hunks)
  • internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (2 hunks)
  • internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • internal/storage/bucket/migrations/29-fix-invalid-metadata-on-reverts/up.sql
  • internal/storage/bucket/migrations/31-fix-transaction-updated-at/up.sql
  • internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1017
File: internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql:30-31
Timestamp: 2025-07-19T18:35:34.260Z
Learning: In Formance ledger migrations, SHARE ROW EXCLUSIVE locks on migration-specific temporary tables like moves_view are acceptable since these tables are dedicated for the migration. ACCESS SHARE locks on transactions table don't block writes, only schema changes.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
  • internal/storage/bucket/migrations/28-fix-pcv-missing-asset/up.sql
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
📚 Learning: 2025-11-28T10:06:01.381Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 1167
File: internal/storage/bucket/migrations/43-fix-missing-inserted-at-in-log-data/up.sql:23-23
Timestamp: 2025-11-28T10:06:01.381Z
Learning: In PostgreSQL migrations for the Formance ledger, to ensure ISO 8601/RFC3339 date formatting when setting JSONB fields, use the pattern `to_jsonb(date)#>>'{}'` to extract the timestamp as a properly formatted string. This leverages JSONB's built-in ISO 8601 encoding. Using `date::text` would depend on the server's DateStyle setting and doesn't guarantee ISO 8601 format.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
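The `to_jsonb(date)#>>'{}'` pattern from this learning works because JSONB always encodes timestamps in ISO 8601, independent of the server's DateStyle; a brief illustration (the `logs`/`data`/`insertedAt` names are hypothetical):

```sql
-- to_jsonb(timestamptz) renders ISO 8601 regardless of DateStyle;
-- the #>>'{}' path operator then unwraps the JSON scalar back to text.
select to_jsonb(now())#>>'{}';   -- e.g. '2025-12-01T13:37:00.123456+00:00'

-- Storing such a value into a JSONB column (hypothetical table/column names):
update logs
set data = jsonb_set(data, '{insertedAt}',
                     to_jsonb(to_jsonb(inserted_at)#>>'{}'))
where not data ? 'insertedAt';
```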
📚 Learning: 2025-05-20T13:48:07.455Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 935
File: internal/controller/system/state_tracker.go:0-0
Timestamp: 2025-05-20T13:48:07.455Z
Learning: In the Formance ledger codebase, sequence reset queries with `select setval` don't require COALESCE around max(id) even for brand new ledgers, as the system handles this case properly.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
  • internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql
  • internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql
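The setval pattern referenced in this learning might look like the following; the sequence and table names here are assumptions, not the actual ledger schema:

```sql
-- Reset the sequence to the current maximum id after an import.
-- Per the learning above, no COALESCE around max(id) is required:
-- the brand-new-ledger case (max(id) IS NULL) is handled by the system.
select setval('transactions_id_seq', (select max(id) from transactions));
```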
📚 Learning: 2025-04-29T11:24:28.923Z
Learnt from: gfyrag
Repo: formancehq/ledger PR: 892
File: internal/controller/ledger/controller_default.go:196-196
Timestamp: 2025-04-29T11:24:28.923Z
Learning: In the ledger Import function, it's critical to maintain proper log ID tracking by updating lastLogID with the current log.ID after each processed log, rather than setting it to nil. This ensures the system can properly validate the ordering of logs and prevent duplicate or out-of-order processing, which is essential for maintaining data integrity in the ledger.

Applied to files:

  • internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Tests
  • GitHub Check: Dirty
🔇 Additional comments (4)
internal/storage/bucket/migrations/27-fix-invalid-pcv/up.sql (1)

1-5: Strategic simplification: NO-OP notice for backport.

The migration has been intentionally reduced to a notice placeholder to disable the expensive PCV recompute in this backport, aligning with the performance optimization goal. No functional issues.
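A no-op placeholder of this shape (a sketch; the exact file content is not shown in this review) keeps the migration slot occupied while doing no work:

```sql
do $$
begin
    raise notice 'Migration superseded by next migration';
end $$;
```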

internal/storage/bucket/migrations/32-fix-log-data-for-reverted-transactions/up.sql (1)

1-67: Correct row_number pagination and proper resource cleanup.

Pagination bounds are correctly implemented: starting with _offset = 0 and using where row_number > _offset and row_number <= _offset + _batch_size ensures consistent batches of 1000 rows (rows 1–1000, 1001–2000, etc.). Temporary table lifecycle is properly managed: created, indexed, processed in batches with commits, and cleaned up.

internal/storage/bucket/migrations/18-transactions-fill-inserted-at/up.sql (1)

21-42: Correct pagination bounds, narrowed writes, and efficient index strategy.

Pagination correctly uses where row_number > i and row_number <= i + _batch_size with 0-based loop initialization, ensuring consistent 1000-row batches (1–1000, 1001–2000, etc.). The inserted_at is null condition (line 42) narrows writes to rows needing updates, improving idempotency and performance. Type-based filtering (line 24) further reduces scope. The INCLUDE index (line 26) enables efficient index-only scans.
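An INCLUDE index of the kind referenced here carries extra payload columns in its leaf pages, so the batched lookup can be answered without heap fetches; a sketch, with column names assumed:

```sql
-- The key column drives the row_number range scan; the included columns
-- are stored in the index leaves only (they are not part of the key),
-- enabling index-only scans for the batch query.
create index logs_transactions_row_number_idx
    on logs_transactions (row_number)
    include (transaction_id, date);
```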

internal/storage/bucket/migrations/17-moves-fill-transaction-id/up.sql (1)

8-40: Excellent precomputed batching strategy with correct pagination and resource efficiency.

Precomputing the temporary table transactions_ids with a single upfront join (avoiding repeated scans or correlated subqueries) combined with analyze (line 17) ensures the query planner has accurate statistics. The loop correctly uses 1-based indexing (for i in 1.._max) paired with where row_number >= i and row_number < i + _batch_size, yielding consistent 1000-row batches (1–1000, 1001–2000, etc.). The INCLUDE index (line 15) enables efficient index-only scans. Explicit per-batch commits (line 36) reduce lock contention on large datasets, aligning with the PR's performance and resilience goals.


Comment @coderabbitai help to get the list of available commands and usage tips.


codecov bot commented Dec 1, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.88%. Comparing base (a39f414) to head (38fc49c).
⚠️ Report is 2 commits behind head on release/v2.3.

Additional details and impacted files
@@               Coverage Diff                @@
##           release/v2.3    #1174      +/-   ##
================================================
+ Coverage         81.84%   81.88%   +0.04%     
================================================
  Files               187      187              
  Lines              9033     9033              
================================================
+ Hits               7393     7397       +4     
+ Misses             1205     1203       -2     
+ Partials            435      433       -2     


@gfyrag gfyrag force-pushed the backport/v2.3/speedup-migration branch from ee80d0d to 73f6d6d on December 1, 2025 13:42

@cubic-dev-ai cubic-dev-ai bot left a comment


1 issue found across 8 files

Prompt for AI agents (1 issue)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql">

<violation number="1" location="internal/storage/bucket/migrations/33-fix-invalid-date-format/up.sql:37">
P2: `row_number()` is 1-based, so the new predicate drops one row per chunk and forces an extra loop whenever the total count is a multiple of `_batch_size`. Use `> _offset` / `<= _offset + _batch_size` to match the original pagination semantics.</violation>
</file>
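The off-by-one that cubic flags can be reproduced in isolation: `row_number()` starts at 1, so a half-open range anchored at a 0-based offset loses the boundary row of the first chunk.

```sql
-- 1000 numbered rows, 0-based offset, batch size 1000:
select count(*)
from (select row_number() over () as rn from generate_series(1, 1000)) t
where rn >= 0 and rn < 0 + 1000;   -- 999 rows: row 1000 falls outside the range

select count(*)
from (select row_number() over () as rn from generate_series(1, 1000)) t
where rn > 0 and rn <= 0 + 1000;   -- 1000 rows: matches the original semantics
```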

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

@gfyrag gfyrag merged commit 6510211 into release/v2.3 Dec 3, 2025
11 checks passed
@gfyrag gfyrag deleted the backport/v2.3/speedup-migration branch December 3, 2025 13:57
@gfyrag gfyrag mentioned this pull request Dec 11, 2025